Methods in Ecology and Evolution — Latest Matching Preprints

1

Ecological connectivity modelling with WebAssembly

Southgate, A. J.; Redihough, J.

2026-07-09 ecology 10.64898/2026.07.08.737333 medRxiv

Top 0.1%

49.3%

Circuit theory has been successfully applied to ecological connectivity modelling, notably via the Circuitscape software, which is typically run locally on a laptop or via a server. For downstream geospatial web applications relying on connectivity analysis, backend infrastructure is required, which can be costly and require advanced data governance. Recent developments in WebAssembly now allow fast C++ or Rust code to be run directly in a sandboxed browser environment for edge computing. We present a WebAssembly/Rust toolset with a geospatial data pipeline and efficient edge-computing implementation of connectivity analysis. This approach may be useful for geospatial modelling software where rasters and memory footprint are small enough for the browser context. Our results show that, as expected, Circuitscape solves 1000x1000 raster networks 1-2x faster but requires further file writes. Accounting for total program runtime, our web implementation can be faster for the given context.
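
As a rough illustration of the circuit-theory idea behind this kind of connectivity analysis, the sketch below computes the effective resistance between two nodes of a small resistor network. It is not the authors' WebAssembly/Rust implementation and not Circuitscape's solver; the function name `effective_resistance` and the toy networks are assumptions made for this example, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def effective_resistance(n_nodes, edges, source, ground):
    """Effective resistance between two distinct nodes of a resistor network.

    edges: iterable of (u, v, resistance) with resistance > 0.
    Builds the graph Laplacian, fixes the ground node at 0 V, injects 1 A
    at the source, and solves for the voltages by Gauss-Jordan elimination.
    Assumes the network is connected.
    """
    # Weighted Laplacian built from conductances (1 / resistance).
    lap = [[Fraction(0)] * n_nodes for _ in range(n_nodes)]
    for u, v, resistance in edges:
        g = 1 / Fraction(resistance)
        lap[u][u] += g
        lap[v][v] += g
        lap[u][v] -= g
        lap[v][u] -= g

    # Drop the grounded node's row and column, then append the current vector.
    keep = [i for i in range(n_nodes) if i != ground]
    aug = [
        [lap[i][j] for j in keep] + [Fraction(1 if i == source else 0)]
        for i in keep
    ]

    m = len(keep)
    for col in range(m):
        pivot = next(row for row in range(col, m) if aug[row][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for row in range(m):
            if row != col and aug[row][col] != 0:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]

    # With 1 A injected, the source voltage equals the effective resistance.
    return aug[keep.index(source)][m]


# Hypothetical 4-node ring of unit resistors: two parallel 2-ohm paths.
ring = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
ring_resistance = effective_resistance(4, ring, source=0, ground=2)

# Hypothetical chain of two unit resistors in series.
chain_resistance = effective_resistance(3, [(0, 1, 1), (1, 2, 1)], source=0, ground=2)
```

On a raster, each cell would be a node and each neighbouring pair an edge whose resistance reflects landscape cost; dense elimination as used here would not scale to 1000x1000 grids, where sparse solvers are the usual choice.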

2

move2utils: a utility toolkit for the move2 ecosystem

Kranstauber, B.; Safi, K.; Scharf, A. K.

2026-07-10 ecology 10.64898/2026.07.07.736908 medRxiv

Top 0.1%

37.7%

1. Studying animal movement at the population scale requires a stable, modern software substrate. Within R, the legacy move package supplied that substrate for over a decade, but its sp/rgeos backbone has been retired. The successor package move2 deliberately confined its scope to the data class and core Movebank API functions. 2. The analytical machinery of move, namely dynamic Brownian-bridge utilisation distributions, the directional bivariate-Gaussian variant, corridor segmentation, and along-track thinning, was left to be ported to the modern sf/terra stack. 3. We present move2utils, an R package that completes and complements that transition. move2utils provides move2-native ports of the move analytical functions, preserves the original C kernels where they exist, and replaces the deprecated spatial scaffolding around them. It additionally ports some of the legacy R-based code to faster C kernels to improve computational speed. move2utils also exposes novel outlier-detection methodology described in detail in a companion paper. 4. The package is open-source (GPL ≤ 3), is developed on the MPCDF GitLab and mirrored on GitHub for public installation, and ships with vignettes and a CI-tested check suite. We illustrate it with a worked example on real tracking data and synthetic datasets.

3

EcoMorph: Universal morphological trait quantification from natural language prompts for ecological research

Amoah, E. I.; Bunch, Z.; Thomas, H. M.; Patch, H. M.; Grozinger, C.

2026-07-12 bioinformatics 10.64898/2026.07.10.737871 medRxiv

Top 0.1%

26.5%

1. Morphological traits such as floral area and body size are fundamental to ecological research, serving as inputs for studies of pollinator-plant interactions, habitat quality, and biodiversity monitoring. However, accurately measuring these traits from images remains challenging, particularly in complex field conditions where existing tools exhibit reduced accuracy and limited generalizability across taxa. 2. We present EcoMorph, a modular morphological measurement system that leverages the Segment Anything Model 3 (SAM3) to quantify traits across diverse ecological contexts. Unlike task-specific segmentation models requiring domain-specific training data, SAM3's prompt-based architecture enables segmentation of arbitrary biological structures from natural-language prompts, using the same underlying model across flowers, insects, and other targets without retraining. From the resulting segmentations, EcoMorph extracts three classes of measurement: area, linear dimensions, and object counts. 3. We validated EcoMorph across two ecological scales. At the intermediate scale, EcoMorph-derived floral area agreed closely with manual ImageJ measurements under simple-background conditions (R² = 0.935, n = 74) and under complex-background conditions (R² = 0.928, n = 58), with valid predictions for 95% of images. At the fine scale, EcoMorph-derived insect body area was strongly correlated with hand-measured intertegular distance (r = 0.810, n = 349), capturing body-size variation across species from the small Bombus impatiens to the large Xylocopa virginica. Object counts matched manual counts almost exactly for well-separated insects in an insect box (R² = 0.9997, n = 12). 4. By combining prompt-based segmentation with modular measurement, EcoMorph enables high-throughput quantification of area, size, and abundance from heterogeneous image sources without taxon-specific training. This generality supports a broad range of ecological applications, including pollinator and plant trait research, biodiversity and abundance monitoring, and allometric biomass estimation.

4

InsectDCT: A generalized pipeline for detection, taxonomic classification, and tracking of insects in camera-trap recordings

Bjerge, K.; Wogram, S. F. A.; Serra-Marin, P. E.; Sakhiashvili, O.; Hoye, T. T.

2026-07-10 ecology 10.64898/2026.07.07.736939 medRxiv

Top 0.1%

22.6%

Automated monitoring of insect pollinators in natural environments with insect camera traps and trained deep learning algorithms provides novel data for insect ecological studies. However, efficient and accurate image recognition analysis of the recorded images or videos is challenging, particularly for images containing small insects against complex backgrounds with diverse vegetation communities. Even when insects can be detected in images, identifying their taxonomy remains difficult, particularly in footage with low image resolution, variable light conditions, and varying distances from the plants, and in cases where insects appear blurry or only partially visible. In this work, we present InsectDCT, an AI-based pipeline for automated detection, hierarchical classification, and tracking of insects in footage of natural vegetation, tested in different environments. The InsectDCT pipeline consists of three stages: insect Detection and localization, hierarchical taxonomic Classification, and spatio-temporal Tracking. In the first stage, insects are detected in time-lapse images or video recordings using the You Only Look Once (YOLO11) object detection architecture. Detection performance is improved using motion-enhanced images, which improve robustness in cluttered, three-dimensional environments. The detector is trained on an extensive dataset that contains more than 60,000 images collected using camera traps deployed across a wide range of plant families and floral habitats. In the second stage, detected insects are classified using a hierarchical taxonomy-aware classification framework that covers 80 taxonomic groups. Classification is performed at multiple taxonomic levels, including order, family, and genus/species, allowing coarse and fine-grained ecological analyses while accounting for varying levels of visual ambiguity. In the third stage, a multi-object tracking module is applied to high temporal-resolution image sequences and video data to associate detections of the same individual across time. InsectDCT code and all datasets are made publicly available.

Author summary: Insects are declining worldwide, creating an urgent need for efficient methods to monitor their abundance, activity, and diversity. Traditional insect surveys often require extensive fieldwork and expert taxonomic identification, which limits the scale and frequency of monitoring. In this study, we developed InsectDCT, an artificial intelligence-based pipeline that automatically detects, classifies, and tracks insects in camera-trap recordings collected from natural and semi-natural environments. Our approach combines deep-learning methods for object detection, hierarchical taxonomic classification, and tracking of individual insect observations through time. Unlike many existing systems that are trained for a single habitat or plant species, we designed our framework using images collected across a wide range of flowering plants, camera systems, and insect groups. This makes the system more transferable to new ecological settings. The classifier can identify insects at multiple taxonomic levels and can return higher-level classifications when species-level identification is uncertain. We demonstrate that the pipeline can process large image datasets efficiently, including on low-power edge-computing devices such as Raspberry Pi systems. By providing both the software and the underlying datasets, we aim to support scalable, non-invasive insect monitoring and facilitate future ecological and conservation research.
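
The idea of returning a higher-level classification when a finer one is uncertain can be sketched as a simple confidence cut-off. This is an illustration only: the function `finest_confident_label`, the 0.8 threshold, and the example confidences are assumptions for this sketch, and the abstract does not describe how InsectDCT actually makes this decision.

```python
def finest_confident_label(predictions, threshold=0.8):
    """Return the finest taxonomic level whose confidence reaches the threshold.

    predictions: list of (level, label, confidence), ordered coarse to fine.
    Returns (level, label), or None if even the coarsest level is uncertain.
    The threshold is a hypothetical value, not one taken from the abstract.
    """
    accepted = None
    for level, label, confidence in predictions:
        if confidence < threshold:
            break  # stop descending once a level is too uncertain
        accepted = (level, label)
    return accepted


# Hypothetical classifier output for one detection.
example = [
    ("order", "Hymenoptera", 0.98),
    ("family", "Apidae", 0.91),
    ("genus", "Bombus", 0.55),
]
result = finest_confident_label(example)
```

Here the genus-level score falls below the cut-off, so the detection is reported at family level.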

5

Standardizing image-derived fish length-frequency distributions to reference measurements using bin-specific error matrices

Shibata, Y.; Iwahara, Y.; Hino, H.; Tsukada, A.; Kisara, Y.; Nishino, T.; Endo, H.

2026-07-06 ecology 10.64898/2026.07.06.736664 medRxiv

Top 0.1%

18.6%

Artificial intelligence (AI)-based image analysis can efficiently estimate fish length, but differences in devices, imaging conditions, operators, and AI models limit comparability among surveys. We propose a standardization framework that estimates a bin-specific error matrix from paired reference measurements and AI-derived lengths and applies it to standardize (correct) AI-derived length-frequency distributions. The Richardson-Lucy expectation-maximization algorithm was used, with the number of iterations selected via cross-validation. Simulations based on empirical length-frequency data from 110 species showed that standardization reduced relative bias and distributional discrepancy; median relative-bias and root mean square error ratios were below 1, and the performance was more affected by the amount of paired data than by the number of cross-validation folds. In real data from 957 Japanese jack mackerel, standardized AI-derived distributions approached human-observer histograms, although discrepancies remained in the range of 160-230 mm. The proposed framework provides a practical approach for improving the comparability of image-derived length-frequency data using paired calibration data, without retraining the underlying AI model.
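
A minimal sketch of the Richardson-Lucy expectation-maximization update for this kind of correction is given below, assuming the error matrix is stored as `error[i][j]` = P(AI-derived bin j | reference bin i). The function name, the three-bin toy matrix, and the fixed iteration count are assumptions for illustration; the abstract selects the number of iterations by cross-validation, which is not reproduced here.

```python
def richardson_lucy(observed, error, n_iter=200):
    """Estimate reference-scale bin proportions from AI-derived proportions.

    observed[j]: proportion of AI-derived lengths falling in bin j.
    error[i][j]: P(AI-derived bin j | reference bin i); each row sums to 1.
    n_iter: fixed number of EM iterations (an assumption of this sketch).
    """
    n_ref, n_obs = len(error), len(observed)
    estimate = [1.0 / n_ref] * n_ref  # flat starting guess
    for _ in range(n_iter):
        # Forward step: AI-scale distribution implied by the current estimate.
        predicted = [
            sum(estimate[i] * error[i][j] for i in range(n_ref))
            for j in range(n_obs)
        ]
        # Multiplicative update: reweight each reference bin by how well it
        # explains the observed-to-predicted ratio.
        estimate = [
            estimate[i] * sum(
                error[i][j] * observed[j] / predicted[j]
                for j in range(n_obs)
                if predicted[j] > 0
            )
            for i in range(n_ref)
        ]
    return estimate


# Hypothetical three-bin example with a known reference distribution.
error_matrix = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
reference = [0.2, 0.5, 0.3]
ai_observed = [
    sum(reference[i] * error_matrix[i][j] for i in range(3)) for j in range(3)
]
recovered = richardson_lucy(ai_observed, error_matrix, n_iter=2000)
```

With noise-free toy data, many iterations recover the reference distribution; with real, noisy histograms, running too long tends to amplify noise, which is one reason to choose the iteration count by cross-validation.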

6

PolliCrop: A high-throughput computer vision pipeline for pollinator monitoring in agroecosystems

Chabert, S.; Bernigaud-Samatan, J.; Blackman, B. K.; Blanchet, N.; Catrice, O.; Donnadieu, C.; Gani, M.; Grousset, R.; Husband, S.; Tueux, G.; Erler, S.; Langlade, N. B.

2026-07-13 animal behavior and cognition 10.64898/2026.07.08.737348 medRxiv

Top 0.2%

13.1%

Flower-visiting insect populations have been declining since the 1990s, especially because of the decrease of floral resources in agricultural settings. Mass flowering crops can help increase resource availability, and plant breeding can be directed towards selecting varieties attracting more flower-visiting insects. This requires the implementation of an automated high-throughput phenotyping tool for assessing the attractiveness of plant genotypes to flower-visiting insects. In this study, (i) we present a procedure for taking standardized images of sunflower heads with camera traps continuously, day and night, in the field; (ii) we trained two versions of a deep learning model, named PolliCrop, to automatically detect and identify three classes of the main insects visiting sunflower in these images (non-Bombus bees, bumble bees, lepidopterans); (iii) we assessed and validated the ability of PolliCrop to correctly predict the true visitation frequencies of the insect classes on three sunflower genotypes; (iv) we present two statistical approaches to compare the insect visitation frequencies between plant genotypes, one including weather variables and the other without. One PolliCrop version performed satisfactorily in detecting the three insect classes. In particular, it predicted the insect visitation frequencies on two sunflower genotypes to within ±10%. The other PolliCrop version can be useful in certain contexts of images and objectives. PolliCrop can be extended in the future to other crop species by training it on new images captured in these crops. The field experimental design needed to compare attractiveness between genotypes is also discussed.

7

Griphus Software for Multi Panel Figure Composition and Experimentation with Emphasis on Taxonomy

Aguiar, A. P.

2026-07-11 zoology 10.64898/2026.07.07.736512 medRxiv

Top 0.2%

13.1%

The preparation of multi panel figures remains a labor intensive step in scientific publication. Although specific tools are available to solve this problem, they are often highly specialized, difficult to install, or time consuming to learn. Griphus is a standalone graphical application designed for rapid composition and experimentation with multi panel figures, developed by and for zoological taxonomists. Functions specifically designed for multi panel composition include automatic figure numbering and placement, aspect ratio operations, spacers, layout rotation, layout suggestions, and automatic generation of figure legends, including scale bar descriptions. The software can both interpret the spatial arrangement of images on the canvas and work with a simple, editable layout formula. It also enables instant multi panel composition, with numbered images and automatic contrast selection for the numbers, obtained simply by loading images. User defined parameters such as target printable dimensions, resolution, spacing, and color mode are preserved throughout the work. The program produces coordinated outputs consisting of the final composite figure, a readable file describing the layout structure, and a .gri file storing images, transformations, and parameters for exact regeneration. Griphus is intended as a complementary tool to professional image software, providing a simple and efficient environment for constructing high quality multi panel figures.

8

AnimalTA: A simple yet flexible tool for video tracking and manual corrections.

Chiara, V.; Buatois, A.; Kim, S.-Y.

2026-06-30 animal behavior and cognition 10.64898/2026.06.27.733780 medRxiv

Top 0.2%

13.0%

1. Video-tracking programs have now become an essential tool for researchers measuring animal behavior across biological fields. The range of available programs is growing rapidly, providing researchers with numerous specific tools that match their precise needs. However, their proliferation may complicate post-tracking data processing, and some programs do not even provide tools for correcting tracking errors or analysing tracking data. In the case of commercial software, the loss of access to a program due to budget limitations or researchers' mobility from one institution to another could prevent them from accessing and visualizing their tracking data. 2. There is therefore a growing need for an accessible and flexible tool to handle post-tracking processes such as the correction and analysis of tracking data obtained across different video-tracking programs. 3. We present here the latest update of the video tracking and analysis program AnimalTA. With this new release, we propose to solve the above-mentioned problems by providing the scientific community with a program that allows data to be imported from other video-tracking programs. As in its previous versions, AnimalTA remains a free, open-source, and highly user-friendly program, ensuring that it will always be accessible without restriction. Now, with this new import option, users who performed their tracking with other programs can benefit from AnimalTA's complete toolset of data visualization, correction, and analysis. 4. Finally, this article gives an overview of the other main improvements associated with this new release. The program is now faster in both video import and tracking, offers an expanded toolset for data visualisation and correction, and features new options for data analysis.

9

From Lotka-Volterra Dynamics to Community Assembly: Theory, Topography, and Empirical Applications

Schreiber, S.; Brennan, J.; Spaak, J. W.

2026-07-15 ecology 10.64898/2026.07.14.738515 medRxiv

Top 0.2%

12.7%

1. Community assembly graphs (CAGs) summarize which species combinations can coexist and how single-species invasions drive transitions between them, encoding the pathways, alternative endpoints, and cycles that make up a community's assembly history. Constructing CAGs from dynamical models requires methods that are both computationally tractable and faithful to the underlying ecological dynamics. However, existing methods rely on restrictive assumptions, such as global stability, that exclude alternative stable states and non-equilibrium dynamics known to occur in empirical systems. 2. We develop a computational pipeline that constructs CAGs from any generalized Lotka-Volterra model. Building on the invasion graph framework and its connection to permanence, the pipeline verifies that community dynamics are bounded, identifies which subsets of species coexist in the sense of permanence, determines which single-species invasions are dynamically realized, and assigns each community a topographic height equal to the length of the longest assembly path leading to it. We also provide a numerical algorithm to simulate the dynamics of community assembly. 3. We prove several general properties of the resulting graphs, including that a successful invader is never subsequently excluded and that, in the absence of assembly cycles, permanent communities can be reassembled by introducing their species one at a time in the right order. We prove that the CAG faithfully reproduces the compositional shifts seen in the numerically simulated dynamics of assembly. Applying the pipeline to three empirically based models (a New Zealand grassland, a European pasture, and a Puerto Rican ant community), we show how competition strength and mutualistic feedbacks reshape the assembly landscape and how intransitive competition generates assembly cycles. 4. Our approach accommodates alternative stable states and non-equilibrium dynamics without requiring global stability, and it turns the long-standing landscape metaphor into a quantitative, mechanistically grounded object by resolving what "height" means. More broadly, it makes the topography of the assembly pathways measurable, providing a way to compare the historical contingency and predictability of the assembly in ecological systems.
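
The height definition (length of the longest assembly path leading to a community) can be sketched as a longest-path computation on a directed acyclic graph. The function name `topographic_heights` and the toy transitions are assumptions for illustration; the sketch covers only the cycle-free case and does not attempt the permanence or boundedness checks described in the abstract.

```python
from functools import lru_cache


def topographic_heights(transitions):
    """Longest-path height of every community in an acyclic assembly graph.

    transitions: iterable of (from_community, to_community) pairs, each
    standing for one realized single-species invasion.
    Communities with no incoming transition get height 0.
    Assumes no assembly cycles; a cycle would cause unbounded recursion.
    """
    predecessors = {}
    for src, dst in transitions:
        predecessors.setdefault(dst, set()).add(src)
        predecessors.setdefault(src, set())

    @lru_cache(maxsize=None)
    def height(node):
        preds = predecessors[node]
        return 0 if not preds else 1 + max(height(p) for p in preds)

    return {node: height(node) for node in predecessors}


# Hypothetical graph: C can establish directly, or by displacing A.
toy_transitions = [
    ("empty", "A"), ("empty", "B"), ("empty", "C"),
    ("A", "C"),            # C invades and replaces A
    ("A", "AB"), ("B", "AB"),
    ("B", "BC"), ("C", "BC"),
]
heights = topographic_heights(toy_transitions)
```

In the toy graph, community C sits at height 2 rather than 1 because the longest route to it passes through A.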

10

DATRASextra: An R package for streamlined workflows with ICES DATRAS bottom-trawl survey data

Mildenberger, T. K.; Maioli, F.; Berg, C. W.

2026-06-30 ecology 10.64898/2026.06.29.735240 medRxiv

Top 0.3%

10.5%

Scientific bottom-trawl surveys provide essential fisheries-independent data for fisheries and ecosystem research. In the Northeast Atlantic, the ICES Database of Trawl Surveys (DATRAS) compiles haul-level information, species- and length-specific catch data, and individual biological observations across multiple long-term surveys. However, reproducible workflows for processing and integrating these relational datasets remain challenging. We present DATRASextra, an open-source R package that provides modular end-to-end workflows for accessing, cleaning, harmonising, quality-controlling, and analysing DATRAS survey data. The package supports derivation of standardised haul-level survey variables, integration of multiple surveys, and generation of analysis-ready datasets for downstream applications including stock assessment, biodiversity analyses, and large-scale synthesis efforts such as FishGlob.

11

Where species distribution models fail under occurrence-data contamination: calibration error concentrates at stream-network headwaters

Miok, K.; Laza, A. V.; Skrlj, B.; Robnik-Sikonja, M.; Parvulescu, L.

2026-07-15 ecology 10.64898/2026.07.14.738364 medRxiv

Top 0.3%

8.7%

Species distribution models (SDMs) increasingly inform conservation and biosecurity decisions in freshwater systems, where the reliability of their uncertainty estimates matters as much as their point predictions. Ensemble SDMs derive prediction intervals from across-replicate variance, but this variance captures systematic error only when replicates disagree about it, an assumption that fails when training data are contaminated with low-accuracy records, the norm in citizen-science datasets. Whether this failure is spatially uniform or concentrates in identifiable parts of a range is unknown. Using a panel of European freshwater crayfish spanning native headwater-associated species and invasive lowland colonizers, we show that contamination-induced calibration failure is strongly spatially structured: it concentrates at stream-network headwaters, the topological tops of the network, where upstream-aggregated predictors are structurally undefined, and scales with contamination severity, replicated across four species and both dominant ensemble protocols (replicate and consensus). The failure is driven by upward prediction bias, not by intervals failing to widen: contaminated ensembles overpredict suitability in headwaters, and because the bias is shared across ensemble members, the intervals do not flag it. This is a conservation-relevant blind spot, because headwaters are both refugia for threatened native crayfish and front lines for invasion; an SDM that silently overpredicts suitability there misdirects survey and management effort toward the segments where its predictions are least trustworthy. Standard leave-one-basin-out conformal calibration, the recommended panel-wide remedy, repairs marginal coverage but leaves headwaters undercovered, because a single calibration threshold is dominated by the abundant non-headwater segments. 
A group-conditional (Mondrian) variant, calibrating the two populations separately, restores reliable coverage in both at no extra cost and reallocates width where it is needed. We recommend network-position-stratified calibration as a default for ensemble SDMs in dendritic freshwater systems.
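
The contrast between a single pooled threshold and group-conditional (Mondrian) thresholds can be sketched with split-conformal quantiles. The calibration scores below are invented for illustration, and the function names are assumptions; the sketch does not reproduce the leave-one-basin-out scheme or the authors' nonconformity score.

```python
import math


def conformal_threshold(scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def mondrian_thresholds(scores_by_group, alpha):
    """One threshold per group instead of a single pooled threshold."""
    return {g: conformal_threshold(s, alpha) for g, s in scores_by_group.items()}


def fraction_covered(scores, threshold):
    """Share of scores at or below the threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


# Hypothetical calibration scores: headwater errors are larger but rare.
calibration = {
    "non_headwater": [i / 100 for i in range(1, 91)],  # 90 small scores
    "headwater": [1.0 + i / 10 for i in range(10)],    # 10 large scores
}
alpha = 0.1
pooled = conformal_threshold(
    [s for group in calibration.values() for s in group], alpha
)
per_group = mondrian_thresholds(calibration, alpha)

# Coverage of the toy headwater scores under each threshold (illustrative
# only: these are the calibration scores themselves, not held-out data).
headwater_pooled = fraction_covered(calibration["headwater"], pooled)
headwater_mondrian = fraction_covered(calibration["headwater"], per_group["headwater"])
```

Because the pooled quantile is set almost entirely by the abundant non-headwater scores, it covers few of the toy headwater scores, whereas the headwater-specific threshold is wider and the non-headwater one is slightly narrower.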

12

Application of Machine Learning Tools for Waterbird Colony Monitoring Provides Gains in Precision and Temporal Efficiency

Vallery, A. C.; Kabra, K.; Gibbons, R.; Arnold, H.; Minnich, N.; Barman, A.

2026-07-02 ecology 10.64898/2026.07.01.735369 medRxiv

Top 0.3%

8.2%

Waterbirds serve as important indicators of both aquatic and terrestrial ecosystem health, making effective monitoring essential for tracking population health and identifying potential causes of decline. Drones have provided opportunities to overcome historic waterbird monitoring challenges, but the expertise and time required for manual image analysis creates a major bottleneck. Recent advances in deep learning-based object detection have enabled rapid, automatic detection of features in complex ecological imagery, though applications have largely been limited to single-species colonies, and practitioners lack quantitative comparisons of annotation time and accuracy across different levels of automation. We systematically compared four waterbird monitoring approaches using identical survey areas from Chester Island, a mixed-species colony in Matagorda Bay, Texas, in 2025: (1) traditional ground-based counts, (2) manual drone imagery-based counts, (3) computer-assisted counts using pre-annotations from an object detector with manual human verification (Human+ML), and (4) fully automated counts using object detector annotations (ML-only). We trained a YOLOv10 object detection model on manually annotated imagery of Chester Island in 2021 and applied it to the 2025 imagery. Manual drone annotation detected 6,530 birds in 40.5 hr and served as the primary reference standard. Human+ML detected 5,826 birds (89% of manual) in 7.7 hr, an 81% reduction in annotation time. ML-only detected 5,679 birds (87% of manual) in approximately 46 min, a 98% reduction. Ground counts recorded 5,868 birds (90% of manual). Detection generalized well across species while classification depended heavily on training data and morphological distinctiveness. The Human+ML workflow emerged as a practical middle ground, providing practitioners with empirical data to evaluate partial versus full automation strategies based on monitoring objectives.
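
The percentages quoted above follow directly from the reported counts and annotation times; the short check below recomputes them. The variable names are ours, and "approximately 46 min" is treated as exactly 46 minutes.

```python
manual_birds, manual_hours = 6530, 40.5  # manual drone annotation (reference)

# (birds detected, annotation hours) as reported in the abstract
workflows = {
    "Human+ML": (5826, 7.7),
    "ML-only": (5679, 46 / 60),  # "approximately 46 min", taken as 46 min
}

summary = {}
for name, (birds, hours) in workflows.items():
    summary[name] = {
        "percent_of_manual": round(100 * birds / manual_birds),
        "time_reduction_percent": round(100 * (1 - hours / manual_hours)),
    }

# Ground counts have no annotation time, only a count to compare.
ground_percent_of_manual = round(100 * 5868 / manual_birds)
```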

13

Intra-African Geographic Domain Shift in Wildlife Camera Trap Species Classification: A Comparative Study of Supervised and Zero-Shot Foundation Models

Nanduri, N.; Ogundare, J.; Anderson, G.

2026-06-25 ecology 10.64898/2026.06.24.734283 medRxiv

Top 0.3%

7.9%

Camera trap networks such as Snapshot Safari have generated millions of labelled wildlife images across Africa, enabling the training of deep learning models for automated species classification. However, deploying models trained in one African region to another remains poorly understood. To the best of our knowledge, this study presents the first systematic evaluation of geographic domain shift within the African continent for machine-learning-based wildlife camera trap species classification. We use three model architectures, each interacting with Snapshot Serengeti in a different way: BEiTV2 is fine-tuned on Serengeti images as a supervised baseline; DINOv2 with FAISS uses Serengeti images as a retrieval index without any weight updates; and BioCLIP is a true zero-shot foundation model that receives no Serengeti training data at all. All three are then evaluated on two Southern African test sets, Snapshot Kgalagadi and Snapshot Kruger, as well as on locally collected wildlife photographs from Botswana. We conduct eight experiments covering in-domain baselines, cross-dataset transfer, data scaling, MegaDetector preprocessing, grayscale vs. colour image conditions, and per-species transfer analysis. This work provides the first empirical characterisation of intra-African domain shift across both supervised and zero-shot architectures, and offers practical guidance for conservation AI practitioners who need to deploy models across the diverse ecosystems of Southern Africa without collecting new labelled data.

14

Confounding effects of inferring gene co-expression networks from pooled data from different biological populations

Runghen, R.; Eliassi-Rad, T.; Bolnick, D. I.

2026-06-29 bioinformatics 10.64898/2026.06.23.734063 medRxiv

Top 0.4%

6.8%

Weighted Gene Co-expression Network Analysis (WGCNA) is routinely applied to pooled datasets from multiple biological populations, genotypes, or treatment groups, implicitly assuming a shared module structure across groups. While the distortion of pairwise correlations by pooling heterogeneous groups is well established statistically, three aspects of this problem have received little systematic attention in the context of co-expression network analysis: the extent to which pooling disrupts the discrete module-level community structure inferred by WGCNA; whether this disruption is detectable from the global topology metrics researchers routinely report; and how prevalent the pooling practice is in published multi-group WGCNA studies. Using analytical toy examples and a four-scenario simulation framework, we address all three questions. Module preservation Zsummary scores declined progressively with between-population divergence, from full preservation under identical populations (mean median Zsummary = 25.2 ± 3.3, 95% interval 19.0-30.7 across 20 simulation replicates) to substantial disruption when both network structure and mean expression differed (mean median Zsummary = 11.9 ± 1.0, 95% interval 10.2-13.5). This disruption was undetectable from global topology metrics: modularity and clustering coefficient remained stable across all scenarios, while edge density was sensitive but non-specific. These findings were corroborated in an empirical reanalysis of divergent lake and stream stickleback transcriptomes, where merged analysis collapsed 26 lake-specific and 59 stream-specific modules into only 19 merged modules. A survey of 100 publications found that 78.7% (95% CI 69.4-87.9%) of multi-group WGCNA studies with sufficient methodological reporting used a single merged analysis. Results were robust across network sizes of 250-1,000 genes and rewiring rates of 10-50%. 
We provide concrete recommendations including module preservation testing in both directions, population-specific baseline networks, and consensus WGCNA as a principled alternative.
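
The statistical point that pooling heterogeneous groups distorts pairwise correlations can be shown with a deliberately small example: two genes that are uncorrelated within each population become strongly correlated once populations with different mean expression are merged. The expression values and the "lake"/"stream" labels are invented for illustration and are not the authors' toy examples or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression of two genes in two populations. Within each
# population the genes are uncorrelated; the second population simply has
# higher mean expression of both genes.
lake_gene1, lake_gene2 = [-1, 1, -1, 1], [-1, -1, 1, 1]
stream_gene1 = [v + 10 for v in lake_gene1]
stream_gene2 = [v + 10 for v in lake_gene2]

r_lake = pearson(lake_gene1, lake_gene2)
r_stream = pearson(stream_gene1, stream_gene2)
r_pooled = pearson(lake_gene1 + stream_gene1, lake_gene2 + stream_gene2)
```

The pooled correlation here is 200/208 (about 0.96) even though both within-population correlations are zero, which is the kind of spurious edge a merged co-expression network would contain.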

15

TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces): An Optimized Algorithm for Vertebrate Taxonomic Assignments in eDNA Metabarcoding, Integrating Molecular, Taxonomic, and Ecological Criteria

Haderle, R.; Jung, G.; Riou, M.; Ung, V.; Jung, J.-L.

2026-07-09 molecular biology 10.64898/2026.06.29.735257 medRxiv

Top 0.4%

6.7%

Environmental DNA (eDNA) metabarcoding has become a powerful approach for large-scale biodiversity assessment, yet taxonomic assignment remains one of its most critical error-prone steps. Current bioinformatic pipelines rely on molecular similarity searches against reference databases, but assignment accuracy is constrained not only by short marker length and database incompleteness, but also by fundamental limitations, including recent species radiations, incomplete lineage sorting, introgression, NUMTs, and the imperfect correspondence between genetic variation and species boundaries. Here, we present TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces), an automated and simple protocol designed to improve taxonomic assignments in eDNA metabarcoding. Initially developed for marine vertebrates, TRIDENT may be used with any barcode and integrates three complementary sources of evidence: molecular similarity (NCBI/GenBank and BOLD), curated taxonomic information (WoRMS), and ecological plausibility derived from biogeographic occurrence data (GBIF). The workflow sequentially constructs candidate taxon lists based on sequence similarity, expands them through taxonomic hierarchies, and filters them using spatial occurrence constraints. It further identifies possible taxa lacking reference barcodes and evaluates their plausibility through CO1-based similarity if data exist in BOLD. TRIDENT has been implemented as a source-available Python tool and tested using empirical eDNA datasets from marine vertebrates as well as simulated communities. Results demonstrate that the tool produces taxonomic assignments consistent with expert manual curation while substantially reducing processing time and attention errors caused by manual processing of large datasets. 
By combining molecular, taxonomic, and ecological criteria within a single framework, TRIDENT improves transparency and reproducibility and provides a robust and flexible solution strengthening confidence in taxonomic identifications in eDNA-based biodiversity assessments.
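
The sequential logic described above (similarity-based candidates, taxonomic expansion, occurrence filtering) might be sketched as follows. This is not TRIDENT's code or API: the function `candidate_taxa`, the 0.97 identity cut-off, the restriction of expansion to congeners, and the placeholder taxon names are all assumptions, and the inputs stand in for what would come from GenBank/BOLD, WoRMS, and GBIF.

```python
def candidate_taxa(hits, taxonomy, occurrences, min_identity=0.97):
    """Three-step filter: similarity hits, congener expansion, occurrence check.

    hits: {species: best sequence identity to the query, between 0 and 1}
    taxonomy: {genus: [species, ...]}
    occurrences: set of species recorded in the study region
    Returns {species: identity, or None if added by expansion only}.
    min_identity is a hypothetical cut-off chosen for this sketch.
    """
    # 1. Molecular evidence: keep species whose reference sequence is close.
    molecular = {sp: ident for sp, ident in hits.items() if ident >= min_identity}

    # 2. Taxonomic expansion: add congeners, which may lack reference barcodes.
    genus_of = {sp: genus for genus, members in taxonomy.items() for sp in members}
    expanded = dict(molecular)
    for sp in molecular:
        for congener in taxonomy.get(genus_of.get(sp), []):
            expanded.setdefault(congener, None)

    # 3. Ecological plausibility: keep only taxa known from the region.
    return {sp: ident for sp, ident in expanded.items() if sp in occurrences}


# Placeholder inputs (not real taxa or database records).
hits = {"Genus_a sp1": 0.99, "Genus_b sp1": 0.90}
taxonomy = {
    "Genus_a": ["Genus_a sp1", "Genus_a sp2", "Genus_a sp3"],
    "Genus_b": ["Genus_b sp1"],
}
occurrences = {"Genus_a sp1", "Genus_a sp2", "Genus_b sp1"}
plausible = candidate_taxa(hits, taxonomy, occurrences)
```

In this toy run, a congener without a matching barcode survives because it occurs in the region, while a regional species with a weak match and a congener absent from the region are both excluded.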

16

How Robust are Multispecies Coalescent Species Delimitations in Taxonomically Complex Systems? A Genomic Assessment Using Mediterranean Tethya Sponges

van der Sprong, J.; Cardone, F.; Hoehna, S.; Schaetzle, S.; Deister, F.; Erpenbeck, D.; Woerheide, G.; Vargas, S.

2026-07-05 evolutionary biology 10.64898/2026.07.04.735074 medRxiv

Top 0.4%

6.1%

Reliable species delimitation underpins biodiversity assessment but remains difficult for organisms with plastic morphology and few diagnostic characters. Multispecies coalescent (MSC) methods can delimit species from genomic data, yet they are rarely tested in taxonomically complex marine invertebrate groups, where they are arguably most needed. We used the three Mediterranean species of the genus Tethya, a rare, well-characterised system within the otherwise taxonomically difficult phylum Porifera (the species are distinguished by multiple independent morphological and ecological characters), to evaluate how robust MSC-based delimitation is in such groups. Analysing 64 single-copy nuclear loci in BEAST2 and BPP, we compared constrained, hypothesis-testing approaches (BFD*, BFdriver, A10) with freer, heuristic ones (SPEEDEMON, A11), and examined their sensitivity to data type, clock model, priors, and the species-collapse threshold. All methods recovered the three recognised Mediterranean species, but the resolution of within-lineage structure was method-dependent. The hypothesis-testing approaches consistently supported six lineages, robustly across data types and model assumptions, whereas the heuristic approaches proved less stable. Configurations without a priori species hypotheses often failed to converge or were computationally intractable, a problem compounded by the relaxed clock. In SPEEDEMON the outcome changed with the collapse threshold. Because our system lacks an independent reference point to calibrate this threshold, any delimitation based on it is poorly constrained. We conclude that constrained, hypothesis-testing delimitation is the most robust and reproducible MSC approach, yielding a quantitative, model-based hypothesis that can be weighed against other lines of evidence to inform taxonomic decisions. 
By clarifying how these methods behave and how their outcomes should be interpreted, our study offers a practical guide for researchers working on comparably complex systems.

17

simSOMA: a cell-lineage based simulator of the somatic VAF spectrum in plants

Johannes, F.

2026-07-01 genomics 10.64898/2026.06.28.735079 medRxiv

Top 0.4%

6.1%

Plants accumulate somatic mutations during growth, and some of these mutations can spread from local cell lineages into branches, organs, or reproductive tissues. There is growing interest in these variants because they can underlie bud-sport traits in crops, contribute to within-organism somatic selection, and provide genetic variation that may be transmitted vegetatively or sexually to future generations. Recent genomic sequencing of bulk and layer-enriched plant tissues has shown that de novo somatic variants can generate complex variant allele-frequency (VAF) spectra. Interpreting these spectra requires understanding how mutations arising during mitotic cell division are filtered or amplified through shoot growth, branching, and organ formation. Because these processes interact across multiple scales, their combined effects are difficult to derive analytically. Here, we present simSOMA, a modular simulator that links rooted plant topologies to explicit cell-lineage dynamics. simSOMA models somatic mutation accumulation during stem-cell self-renewal in the shoot apical meristem, clonal expansion from the stem-cell niche to the meristem periphery, branch founding, and organ formation. Applying simSOMA across diverse growth scenarios revealed how individual processes can be isolated, varied, and combined to assess their effects on organ-level VAF spectra and among-organ variant sharing. The same simulated spectra can also be transformed to represent bulk or layer-enriched sampling and phased or unphased variant readouts, separating effects of developmental history from those introduced by tissue composition and allele counting. Because simSOMA is organized around modules with defined input-output interfaces, individual developmental components can be replaced or extended as new empirical information becomes available. 
This makes simSOMA a flexible tool for testing alternative models of somatic mosaicism in plants and for guiding the design and interpretation of VAF-based sequencing studies. The simulator is available at https://github.com/jlab-code/simSOMA.

18

Ecosystem service gradients at protected area borders reveal multiple patterns and prevalent management conflicts

Gonzalez-Garcia, A.; Neyret, M.; Lopez-Tejedor, A.; Prima, M. C.; Si-Moussi, S.; Renaud, J.; Gueguen, M.; Lavorel, S.

2026-07-01 ecology 10.64898/2026.06.30.735453 medRxiv

Top 0.4%

6.0%

Protected areas cannot halt biodiversity loss in isolation; integrating them with surrounding human-dominated landscapes is critical. However, this integration is challenged by substantial landscape heterogeneity at their borders, hindering our understanding of cross-border changes in ecosystem service provision. We introduce a novel framework for characterizing these dynamics by analyzing ecosystem service gradients along protected area borders. For 16 protected areas in the French Alps, we assessed 12 ecosystem services using a mix of established biophysical models and novel connectivity-based models for mobile species. These were aggregated into three stakeholder-driven domains reflecting respectively rural, cultural, and urban management priorities. Automated polynomial regression analysis classified borders into five gradient types. The most common were 'Decreasing Gradients', representing a decline in ecosystem services outside the protected area, and 'Increasing Gradients', with the opposite pattern. Our analysis reveals these patterns are driven by specific landscape configurations, uncovering frequent trade-offs between the three management priorities, where, for instance, landscapes supporting rural priorities often degrade cultural and urban ones. We also identify key opportunities for synergies, by identifying areas where ecosystem services for all three priority domains increase simultaneously outside the protected area. This spatially explicit typology provides a powerful diagnostic tool for designing targeted interventions, such as prioritizing habitat restoration where ecosystem services decline or managing agricultural landscapes to mitigate conflicts across management priorities, supporting a more effective integration of protected areas into the wider landscape.

19

Simulating population pangenomes under coalescent demographic models with MSpangenome

Piat, L.; Denni, S.; Dubois, S.; Linard, B.; Duvaux, L.

2026-07-03 bioinformatics 10.64898/2026.06.29.735168 medRxiv

Top 0.5%

5.3%

Motivation: Pangenome variation graphs (PVGs) are increasingly used to represent genomic diversity, yet there is currently no general framework for generating population pangenomes directly from explicit evolutionary histories. Existing simulators typically focus on individual classes of variation and do not integrate these variations within a genealogy-aware framework driven by explicit demographic histories. As a result, evaluating pangenome methods in realistic population-genetic settings remains challenging, and benchmark datasets with known evolutionary ground truth are scarce.
Results: We present MSpangenome, a genealogy-aware framework that bridges coalescent population genetic simulations and pangenome graph analyses. The pipeline combines ancestry simulation with msprime and a de novo graph construction algorithm to generate PVGs directly from simulated genealogies. By explicitly modeling recombination, demographic history and incomplete lineage sorting, MSpangenome produces structurally complex pangenomes in which nested and overlapping structural variants emerge naturally from the underlying genealogies, while their evolutionary history and graph topology remain known by construction. This provides a general framework for generating realistic population pangenomes and establishing ground-truth datasets for methodological evaluation. We demonstrate its utility by generating population-scale pangenomes and using them as controlled references to benchmark the widely used graph construction tools, PGGB and Minigraph-Cactus. Our analyses reveal contrasting performance regimes across levels of sequence diversity, sample sizes and classes of structural variation, highlighting the value of simulation-based benchmarking for identifying reconstruction errors that are hard to detect using empirical datasets alone.
Availability and implementation: MSpangenome is implemented in Python, fully containerized, freely available at https://forge.inrae.fr/pangepop/MSpangepop and mirrored at https://github.com/inrae/MSpangepop.

20

Testing the waters of macrophyte biodiversity with multiscale spatial analysis of public lake monitoring data

Tseitlin, M.; Garcia-Giron, J.; Crabot, J.; Jiang, X.; Larkin, D. J.

2026-06-23 ecology 10.64898/2026.06.22.733670 medRxiv

Top 0.5%

5.3%

Freshwater monitoring programmes like the European Union's Water Framework Directive (WFD) provide a wealth of data on European lake status, including water quality and macrophytes (aquatic plants) as critical habitat features that support the health of humans and wildlife. Easier WFD data access can enable external management and research to better safeguard human and natural freshwater use. We demonstrate a replicable workflow to easily download and process multi-year (2007-2024) observations of lake macrophytes (425 sites) and complementary water quality variables (202 sites) from Swedish WFD data. Then, we illustrate the value of improved data access to address ecological questions that drive conservation, investigating how spatial scales influence macrophyte richness and associated water quality relationships using a spatial random intercept model. Decomposing the spatial intercept links small scales (<10 km) to site-level gradients and large scales (>100 km) to biogeographical drivers. Stochastic and environmentally structured processes coexisted at intermediate scales (10-100 km). Adding water quality rarely improved overall predictive performance of macrophyte diversity models but consistently influenced the role of different spatial scales. Water quality variables showed consistent spatially structured variation at intermediate scales and unique spatial patterns in tandem, overlapping with large-scale biogeographical influences. Altogether, we show context-dependencies for spatial model interpretation and provide guidance in accounting for spatial confounding to improve inferential and predictive performance. Our workflow and results show a clear way forward for accessing high-quality macrophyte and water quality data sets and their utility for addressing ecological questions that guide macrophyte protection under the WFD.
Highlights:
- [...] years of Swedish macrophyte and water quality monitoring data were extracted.
- [...] richness showed scale-specific patterns linked to geographic gradients.
- [...] best predictive models for richness had no water quality at all.
- [...] overlap in their spatial scales and must be carefully separated.
- Open access data and multiscale analysis can apply to many ecological questions.